Skip to content

pd.Timestamp constructor ignores missing arguments #32683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

pd.Timestamp constructor ignores missing arguments #32683

wants to merge 1 commit into from

Conversation

artsiomkaltovich
Copy link

Hello. The PR fixes #31930

Please review and merge if you think it is good enough :)

Regards,
Artyom.

@pep8speaks
Copy link

Hello @ArtyomKaltovich! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 446:89: E501 line too long (119 > 88 characters)

@artsiomkaltovich
Copy link
Author

Some tests failed due to changes, mostly pickle ones.
I think that tz should be key only parameter (btw, usually it is used in such way), but it causes failures.
What should I do? Change or not to change?)

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#31930 is more about raising when illegal combinations of arguments are specified together (not just tz)

I think there needs to be more discussion on the issue before we move forward with a solution.

@artsiomkaltovich
Copy link
Author

artsiomkaltovich commented Mar 13, 2020

#31930 is more about raising when illegal combinations of arguments are specified together (not just tz)

I think there needs to be more discussion on the issue before we move forward with a solution.

Hello @mroeschke
Thank you for your feedback.

I've just faced with this bug recently at my job and decided to fix it. But you are right, it is more connected to the issue #31929

What if the other prohibited combinations? I can name only this one and ts_input with year, month and day specified via key params. I can add this check if it is not yet presented. Are there any more cases? functools.singledispatch could help us here, but it will probably slow down the execution. So I guess there is no better way here except the current one (I mean isinstance checking it the constructor body). Don't you agree?

If you need a little bit more time to think about refactoring, can we for now submit it as a fix for #31929 ? Because, as I said previously it prevents me to implement work tasks and it is unexpected behavior what is good to avoid.

I've just need to know if changing tz to key only param is acceptable or not?

Thank you and sorry for mentioning the wrong issue :)

Regards,
Artyom.

@mroeschke
Copy link
Member

I proposed #31930 (comment) as the only legal combination of parameters for #31930. Probably needs more discussion on that front.

As for #31929 which you are trying to address, rearranging the arguments in the constructor would be an API change and would need to undergo a deprecation cycle. The easier solution in my option would be to deprecate the tz argument entirely (which would still require a deprecation cycle)

@artsiomkaltovich
Copy link
Author

Ok, I see your point.

So for now, I can try to bypass the bug without changing api. That will fix the problem and #31929 could be closed, while we can still think about better approach for #31930

@artsiomkaltovich
Copy link
Author

artsiomkaltovich commented Mar 13, 2020

Actually singledispatch works with functions with different amount of parameters.

from datetime import datetime
from functools import singledispatch


@singledispatch
def Timestamp(tsinput):
    raise NotImplementedError('Unsupported type')


@Timestamp.register(int)
def _(year, month, day):
    print(f"Timestamp({year}, {month}, {day})")


@Timestamp.register(str)
def _(ts_info):
    print(f"Timestamp({ts_info})")


@Timestamp.register(datetime)
def _(ts_info):
    print(f"Timestamp({ts_info})")

if __name__ == '__main__':
    Timestamp(2020, 2, 2)  # if omit the last arg  
                           # TypeError: _() missing 1 required positional argument: 'day'
                           # will be generated.
    Timestamp("2019-11-11")
    Timestamp(datetime(2017, 3, 4))

produces

Timestamp(2020, 2, 2)
Timestamp(2019-11-11)
Timestamp(2017-03-04 00:00:00)

but of course more research of performance, function wrupping and doc generation are needed.

@mroeschke
Copy link
Member

Do you want to try to address #31930 with singledispatch in this PR @ArtyomKaltovich?

@artsiomkaltovich
Copy link
Author

Do you want to try to address #31930 with singledispatch in this PR @ArtyomKaltovich?

If it is acceptable variant, why not?)

But I am not familiar with Pandas CI performance testing, where can I read/know about it?

@jreback jreback changed the title fix issue 31930 pd.Timestamp constructor ignores missing arguments Mar 15, 2020
@jreback jreback added Error Reporting Incorrect or improved errors from pandas Datetime Datetime data dtype labels Mar 15, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we can use singledispatch in cython (well we can but its going to be non-performant); you can try to prove me wrong though.

In any event, we simply want to validate combinations of parameters and reject invalid ones.

*,
tz=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot change the position of these args

@artsiomkaltovich
Copy link
Author

i don't think we can use singledispatch in cython (well we can but its going to be non-performant); you can try to prove me wrong though.

In any event, we simply want to validate combinations of parameters and reject invalid ones.

Hello @jreback
Thank you for your feedback.
I am also afraid of performance issue, but maybe singledispatch create the same code with isinstance as already presented one. That's why I've asked how can I measure it with Pandas performance testing?

we cannot change the position of these args
Thank you I will revert parameter order. I can submit a fix for #31929 independently while we a thinking about better parameter validation approach.

@artsiomkaltovich
Copy link
Author

Hello @mroeschke @jreback
I was trying to leave param order as is. The following code works:

    stamp = Timestamp(2019, 1, 2, 3, 4, 5, 6, 7, tzinfo=pytz.timezone("UTC"))
    stamp = Timestamp(year=2019, month=1, day=2, hour=3, minute=4, second=5, microsecond=6, nanosecond=7, tz="UTC")

but it still has problems with such code:

        stamp = Timestamp(2019, 1, 1, tz="EST5EDT")
        E   TypeError: __new__() got multiple values for keyword argument 'tz'

Because tz is the third arg and 1 is passed as tz, not day.

I can create PR, because it fixes at least one problem, or we can continue to experiment, but I do not see any ways to fix it except singledispatch or parameter reordering :(

Regards.

@mroeschke
Copy link
Member

In terms of performance testing see: https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite

I think these issues can be solved together by either reordering parameters or using singledispatch

@artsiomkaltovich
Copy link
Author

Thank you @mroeschke
I will investigate it.

@mroeschke
Copy link
Member

@ArtyomKaltovich do you have time to merge in master and address the review comments?

@artsiomkaltovich
Copy link
Author

Hello @mroeschke
Unfortunately I didn't have enough time to work on it due to work and study, I am planing to do it on this week.

Regards.

@jreback jreback added this to the No action milestone Jul 17, 2020
@jreback
Copy link
Contributor

jreback commented Jul 17, 2020

closing as stale
if u want to continue pls ping

@jreback jreback closed this Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.Timestamp constructor ignores missing arguments
4 participants